Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Space-precedence does not apply to value-level operators #10597

Merged
merged 3 commits into from
Jul 24, 2024

Conversation

kazcw
Copy link
Contributor

@kazcw kazcw commented Jul 18, 2024

Pull Request Description

In a sequence of value-level operators, whitespace does not affect relative precedence. Functional operators still follow the space-precedence rules.

The "functional" operators are: >> << |> |>> <| <<| : ., application, and any operator containing <- or ->. All other operators are considered value-level operators.

Asymmetric whitespace can still be used to form operator sections of value-level operators, e.g. +2 * 3 is still equivalent to x -> (x+2) * 3.

Precedence of application is unchanged, so f x+y is still equivalent to f (x + y) and f x+y * z is still equivalent to (f (x + y)) * z.

Any attempt to use spacing to override value-level operator precedence will be caught by the new enso linter. Mixed spacing (for clarity) in value-operator expressions is allowed, as long as it is consistent with the precedences of the operators.

Closes #10366.

Important Notes

Precedence warnings:

  • The parser emits a warning if the whitespace in an expression is inconsistent with its effective precedence.
  • A new enso linter can be run with ./run libraries lint. It parses all .enso files in distribution/lib and test, and reports any errors or warnings. It can also be run on individual files: cargo run --release --bin check_syntax -- file1 file2... (the result may be easier to read than the ./run output).
  • The linter is also run as part of ./run lint, so it is checked in CI.

Additional language change:

  • The exponentiation operator (^) now has higher precedence than the multiplication class (*, /, %). This change did not affect any current enso files.

Library changes:

  • The libraries have been updated. The new warnings were used to identify all affected code; the changes themselves have not been programmatically verified (in many cases their equivalence relies on the commutativity of string concatenation).

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the
    Scala,
    Java,
    TypeScript,
    and
    Rust
    style guides. In case you are using a language not listed above, follow the Rust style guide.
  • Unit tests have been written where possible.

@@ -841,13 +851,13 @@ fn operator_sections() {
#[test]
fn template_functions() {
#[rustfmt::skip]
test("_.map (_+2 * 3) _*7", block![
test("_.map (_ + 2*3) _*7", block![
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The semantics of this test have changed, because the original syntax can no longer be expressed with template functions.

@kazcw kazcw added the CI: Clean build required CI runners will be cleaned before and after this PR is built. label Jul 18, 2024
@kazcw kazcw force-pushed the wip/kw/math-precedence branch 3 times, most recently from 06a0360 to 35f3e1a Compare July 18, 2024 16:41
@kazcw kazcw self-assigned this Jul 18, 2024
@kazcw kazcw force-pushed the wip/kw/math-precedence branch 6 times, most recently from fd48f03 to 491518a Compare July 18, 2024 18:12
Copy link
Member

@JaroslavTulach JaroslavTulach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -44,7 +44,7 @@ class SimpleArithmeticTest extends InterpreterTest {
val code =
"""import Standard.Base.Data.Numbers
|main =
| expr = (10 * 3+2) - 1
| expr = 10 * (3+2) - 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only a single fix in runtime-integration-tests? Good.

@@ -640,15 +640,15 @@ spec_write suite_builder suffix test_sheet_name =
parent = enso_project.data / "transient" / "nonexistent"
parent.exists.should_be_false

f = parent / "foo."+suffix
f = parent / ("foo."+suffix)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is unfortunate. It has nothing to do with arithmetic and looks really bad surrounded by ( and ).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually we will have string interpolation, right? The parser supports it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please - lets agree how we want to do that :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually we will have string interpolation, right? The parser supports it.

Interesting. I didn't know we are supposed to support string interpolation. Where's the spec? What's the syntax? What Tree will one get?

Both main x = 'x=$x' and main x = "x=$x" resulted in a simple text literal on the IR level...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/enso-org/design/blob/wip/wd/enso-spec/epics/enso-spec-1.0/04.%20Expressions.md#interpolated-text-literals
The elements returned by getElements can return a TextElement.Splice, but we flatten all of the elements into a string in TreeToIr to fit the current IR representation.
We also flatten in the GUI. When the backend supports splices, in the frontend we would probably want to display the spliced expression with widgets, but we would need to be careful that it is clear that it is part of the string expression.

@@ -50,6 +50,7 @@ exports[`Parsing '2
"spanLeftOffsetCodeStartLine": 1,
"spanLeftOffsetCodeStartUtf8": 4,
"type": "Number",
"warnings": [],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I read this correctly that warnings are propagated to TypeScript. How are the warnings propagated to CLI? I don't see any changes in Tree. How will the CLI report these warnings?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is Tree.getWarnings, however they are serialized as warning IDs (plus warning-specific details), with a separate table providing the string/template to display each warning ID. I have not yet implemented the JNI to turn this in to a getMessage for the Warning object.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to expose the linter via enso-engine*/bin/enso we will need a way to access the warnings from JVM.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, expect a PR soon. I just wanted to get this part merged since it touches a lot of files.

const ENSO_EXTENSION: &str = ".enso";

pub fn cargo_run_linter_cmd(repo_root: &Path) -> Result<Command> {
let mut ret = Cargo.cmd()?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While having such a linter may satisfy our current development needs, I believe the right linter shall be integrated into the enso command.

There is enso --compile that is being invoked as part of CI already. We don't need a new CI task (in my opinion), but rather we should make sure that enso --compile (maybe with --lint) fails the CI when the formatting isn't correct.

Copy link
Contributor Author

@kazcw kazcw Jul 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's important for the compiler to be able to emit syntax warnings. I think it's also useful to have a simple standalone linter--when linting fails in CI, it's nice if it happens right away, not after building the engine. It isn't a new CI task; it's part of ./run lint.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's also useful to have a simple standalone linter

Indeed. However such a linter has to be available to Enso users, not just to us. E.g. it cannot be part of ./run lint. It has to be part of enso-engine*/bin/enso.

How will we expose the linter to Enso end users? What needs to be changed to make that happen? Can you report an issue describing what needs to be done? Can you implement it (and remove current ./run lint temporary solution) or shall JVM guys in engine team do it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The standalone linter is only for us, to support our CI process. It can provide results before the engine is finished building. For that purpose, it cannot be part of enso-engine*/bin/enso. End users will receive warnings from enso-engine*/bin/enso.

@kazcw
Copy link
Contributor Author

kazcw commented Jul 19, 2024

  • I'd rather see linting integrated into enso CLI interface

We'll do that too. The separate linter came for free (I needed to write it to test the warnings and update the libs), and was needed for CI.

* when at the spacing issues, don't you want to fix [Expressions not being run if indented by 1 more space than current indentation level #9419](https://github.com/enso-org/enso/issues/9419)?

Line indentation and spacing between subexpressions are handled in different parts of the parser (block-building happens long before precedence-resolution).

Copy link
Member

@jdunkerley jdunkerley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving the libs changes.

@JaroslavTulach JaroslavTulach self-requested a review July 22, 2024 15:55
Comment on lines +18 to +34
fn main() {
let args = std::env::args().skip(1);
let parser = enso_parser::Parser::new();
let parse_time: std::time::Duration = args
.map(|path| {
let code = std::fs::read_to_string(path).unwrap();
let mut code = code.as_str();
if let Some((_meta, code_)) = enso_parser::metadata::parse(code) {
code = code_;
}
let start = std::time::Instant::now();
std::hint::black_box(parser.run(code));
start.elapsed()
})
.sum();
println!("Parse time: {} ms", parse_time.as_millis());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, we could use standard rust benchmark tests, as we still use nightly rust. But it may be done later, of course.

#[derive(Clone, Debug, Eq, PartialEq, Serialize, Reflect, Deserialize)]
#[allow(missing_copy_implementations)] // Future warnings may have attached information.
pub struct Warning {
id: u32,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we don't use just WarningId here (And make WarningId #[repr(u32)])?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This representation is designed for FFI. If we used the parser in Rust I would use an enum with parameters, with serialization overridden to produce this representation. Since the Rust API is only used by debug tools, I cut out the Rust-friendly type.

Comment on lines +797 to +800
/// Template strings for printing warnings.
// These must be defined in the same order as the [`WarningId`] variants.
pub const WARNINGS: [&str; WarningId::NUM_WARNINGS as usize] =
["Spacing is inconsistent with operator precedence"];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about a function with a single match? It would not require NUM_WARNINGS, and is easier to read (no longer need to care about proper order), and - if written properly - will optimize to array of &str anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is to send the table across the FFI boundary, and then stringify warnings in the GC'd languages. This avoids FFI calls, encoding conversion, and allocations when emitting warnings. In the future we might move the warning table to a JSON file (which could be passed to some codegen) to further optimize this process and also make the messages easier to edit (or even localize).

pub struct ColdVec<T> {
#[allow(clippy::box_collection)]
#[reflect(as = "Vec<T>")]
elements: Option<Box<Vec<T>>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why box?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is how this struct fulfills its purpose. It has a shallow size of 1 pointer, and when unused that is its total size. A Vec has a shallow size of 3 pointers, even if unused.

@kazcw kazcw added the CI: Ready to merge This PR is eligible for automatic merge label Jul 23, 2024
@kazcw kazcw mentioned this pull request Jul 24, 2024
4 tasks
@jdunkerley jdunkerley added the CI: Keep up to date Automatically update this PR to the latest develop. label Jul 24, 2024
@jdunkerley jdunkerley removed the CI: Keep up to date Automatically update this PR to the latest develop. label Jul 24, 2024
@mergify mergify bot merged commit e5b85bf into develop Jul 24, 2024
42 checks passed
@mergify mergify bot deleted the wip/kw/math-precedence branch July 24, 2024 10:55
jdunkerley pushed a commit that referenced this pull request Jul 24, 2024
In a sequence of value-level operators, whitespace does not affect relative precedence. Functional operators still follow the space-precedence rules.

The "functional" operators are: `>> << |> |>> <| <<| : .`, application, and any operator containing `<-` or `->`. All other operators are considered value-level operators.

Asymmetric whitespace can still be used to form *operator sections* of value-level operators, e.g. `+2 * 3` is still equivalent to `x -> (x+2) * 3`.

Precedence of application is unchanged, so `f x+y` is still equivalent to `f (x + y)` and `f x+y * z` is still equivalent to `(f (x + y)) * z`.

Any attempt to use spacing to override value-level operator precedence will be caught by the new enso linter. Mixed spacing (for clarity) in value-operator expressions is allowed, as long as it is consistent with the precedences of the operators.

Closes #10366.

# Important Notes
Precedence warnings:
- The parser emits a warning if the whitespace in an expression is inconsistent with its effective precedence.
- A new enso linter can be run with `./run libraries lint`. It parses all `.enso` files in `distribution/lib` and `test`, and reports any errors or warnings. It can also be run on individual files: `cargo run --release --bin check_syntax -- file1 file2...` (the result may be easier to read than the `./run` output).
- The linter is also run as part of `./run lint`, so it is checked in CI.

Additional language change:
- The exponentiation operator (`^`) now has higher precedence than the multiplication class (`*`, `/`, `%`). This change did not affect any current enso files.

Library changes:
- The libraries have been updated. The new warnings were used to identify all affected code; the changes themselves have not been programmatically verified (in many cases their equivalence relies on the commutativity of string concatenation).

(cherry picked from commit e5b85bf)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: Clean build required CI runners will be cleaned before and after this PR is built. CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Changing Space Operator Precedence For Maths
4 participants